Goto

Collaborating Authors

 ultrasound imaging


Auto-US: An Ultrasound Video Diagnosis Agent Using Video Classification Framework and LLMs

Yang, Yuezhe, Guo, Yiyue, Cai, Wenjie, Ruan, Qingqing, Wang, Siying, Dong, Xingbo, Jin, Zhe, Dai, Yong

arXiv.org Artificial Intelligence

AI-assisted ultrasound video diagnosis presents new opportunities to enhance the efficiency and accuracy of medical imaging analysis. However, existing research remains limited in terms of dataset diversity, diagnostic performance, and clinical applicability. In this study, we propose \textbf{Auto-US}, an intelligent diagnosis agent that integrates ultrasound video data with clinical diagnostic text. To support this, we constructed \textbf{CUV Dataset} of 495 ultrasound videos spanning five categories and three organs, aggregated from multiple open-access sources. We developed \textbf{CTU-Net}, which achieves state-of-the-art performance in ultrasound video classification, reaching an accuracy of 86.73\% Furthermore, by incorporating large language models, Auto-US is capable of generating clinically meaningful diagnostic suggestions. The final diagnostic scores for each case exceeded 3 out of 5 and were validated by professional clinicians. These results demonstrate the effectiveness and clinical potential of Auto-US in real-world ultrasound applications. Code and data are available at: https://github.com/Bean-Young/Auto-US.


Pediatric Appendicitis Detection from Ultrasound Images

Hosseinabadi, Fatemeh, Sharifi, Seyedhassan

arXiv.org Artificial Intelligence

Pediatric appendicitis remains one of the most common causes of acute abdominal pain in children, and its diagnosis continues to challenge clinicians due to overlapping symptoms and variable imaging quality. This study aims to develop and evaluate a deep learning model based on a pretrained ResNet architecture for automated detection of appendicitis from ultrasound images. We used the Regensburg Pediatric Appendicitis Dataset, which includes ultrasound scans, laboratory data, and clinical scores from pediatric patients admitted with abdominal pain to Children Hospital. Hedwig in Regensburg, Germany. Each subject had 1 to 15 ultrasound views covering the right lower quadrant, appendix, lymph nodes, and related structures. For the image based classification task, ResNet was fine tuned to distinguish appendicitis from non-appendicitis cases. Images were preprocessed by normalization, resizing, and augmentation to enhance generalization. The proposed ResNet model achieved an overall accuracy of 93.44, precision of 91.53, and recall of 89.8, demonstrating strong performance in identifying appendicitis across heterogeneous ultrasound views. The model effectively learned discriminative spatial features, overcoming challenges posed by low contrast, speckle noise, and anatomical variability in pediatric imaging.


Dolphin v1.0 Technical Report

Weng, Taohan, Hu, Kaibing, Liu, Henan, Liu, Siya, Liu, Xiaoyang, Liu, Zhenyu, Ren, Jiren, Wang, Boyan, Wang, Boyang, Wang, Yiyu, Wu, Yalun, Yan, Chaoran, Yan, Kaiwen, Yu, Jinze, Zhang, Chi, Zhang, Duo, Zheng, Haoyun, Guo, Xiaoqing, Souquet, Jacques, Guo, Hongcheng, Le, Anjie

arXiv.org Artificial Intelligence

Ultrasound is crucial in modern medicine but faces challenges like operator dependence, image noise, and real-time scanning, hindering AI integration. While large multimodal models excel in other medical imaging areas, they struggle with ultrasound's complexities. To address this, we introduce Dolphin v1.0 (V1) and its reasoning-augmented version, Dolphin R1-the first large-scale multimodal ultrasound foundation models unifying diverse clinical tasks in a single vision-language framework.To tackle ultrasound variability and noise, we curated a 2-million-scale multimodal dataset, combining textbook knowledge, public data, synthetic samples, and general corpora. This ensures robust perception, generalization, and clinical adaptability.The Dolphin series employs a three-stage training strategy: domain-specialized pretraining, instruction-driven alignment, and reinforcement-based refinement. Dolphin v1.0 delivers reliable performance in classification, detection, regression, and report generation. Dolphin R1 enhances diagnostic inference, reasoning transparency, and interpretability through reinforcement learning with ultrasound-specific rewards.Evaluated on U2-Bench across eight ultrasound tasks, Dolphin R1 achieves a U2-score of 0.5835-over twice the second-best model (0.2968) setting a new state of the art. Dolphin v1.0 also performs competitively, validating the unified framework. Comparisons show reasoning-enhanced training significantly improves diagnostic accuracy, consistency, and interpretability, highlighting its importance for high-stakes medical AI.


Tactile-Guided Robotic Ultrasound: Mapping Preplanned Scan Paths for Intercostal Imaging

Zhang, Yifan, Huang, Dianye, Navab, Nassir, Jiang, Zhongliang

arXiv.org Artificial Intelligence

Medical ultrasound (US) imaging is widely used in clinical examinations due to its portability, real-time capability, and radiation-free nature. To address inter- and intra-operator variability, robotic ultrasound systems have gained increasing attention. However, their application in challenging intercostal imaging remains limited due to the lack of an effective scan path generation method within the constrained acoustic window. To overcome this challenge, we explore the potential of tactile cues for characterizing subcutaneous rib structures as an alternative signal for ultrasound segmentation-free bone surface point cloud extraction. Compared to 2D US images, 1D tactile-related signals offer higher processing efficiency and are less susceptible to acoustic noise and artifacts. By leveraging robotic tracking data, a sparse tactile point cloud is generated through a few scans along the rib, mimicking human palpation. To robustly map the scanning trajectory into the intercostal space, the sparse tactile bone location point cloud is first interpolated to form a denser representation. This refined point cloud is then registered to an image-based dense bone surface point cloud, enabling accurate scan path mapping for individual patients. Additionally, to ensure full coverage of the object of interest, we introduce an automated tilt angle adjustment method to visualize structures beneath the bone. To validate the proposed method, we conducted comprehensive experiments on four distinct phantoms. The final scanning waypoint mapping achieved Mean Nearest Neighbor Distance (MNND) and Hausdorff distance (HD) errors of 3.41 mm and 3.65 mm, respectively, while the reconstructed object beneath the bone had errors of 0.69 mm and 2.2 mm compared to the CT ground truth.


EdgeSRIE: A hybrid deep learning framework for real-time speckle reduction and image enhancement on portable ultrasound systems

Cho, Hyunwoo, Lee, Jongsoo, Kang, Jinbum, Yoo, Yangmo

arXiv.org Artificial Intelligence

Speckle patterns in ultrasound images often obscure anatomical details, leading to diagnostic uncertainty. Recently, various deep learning (DL)-based techniques have been introduced to effectively suppress speckle; however, their high computational costs pose challenges for low-resource devices, such as portable ultrasound systems. To address this issue, EdgeSRIE, which is a lightweight hybrid DL framework for real-time speckle reduction and image enhancement in portable ultrasound imaging, is introduced. The proposed framework consists of two main branches: an unsupervised despeckling branch, which is trained by minimizing a loss function between speckled images, and a deblurring branch, which restores blurred images to sharp images. For hardware implementation, the trained network is quantized to 8-bit integer precision and deployed on a low-resource system-on-chip (SoC) with limited power consumption. In the performance evaluation with phantom and in vivo analyses, EdgeSRIE achieved the highest contrast-to-noise ratio (CNR) and average gradient magnitude (AGM) compared with the other baselines (different 2-rule-based methods and other 4-DL-based methods). Furthermore, EdgeSRIE enabled real-time inference at over 60 frames per second while satisfying computational requirements (< 20K parameters) on actual portable ultrasound hardware. These results demonstrated the feasibility of EdgeSRIE for real-time, high-quality ultrasound imaging in resource-limited environments.


High-Fidelity Functional Ultrasound Reconstruction via A Visual Auto-Regressive Framework

Chen, Xuhang, Li, Zhuo, Shen, Yanyan, Mahmud, Mufti, Pham, Hieu, Pun, Chi-Man, Wang, Shuqiang

arXiv.org Artificial Intelligence

-- Functional ultrasound (fUS) imaging provides exceptional spatiotemporal resolution for neurovascular mapping, yet its practical application is significantly hampered by critical challenges. Foremost among these are data scarcity, arising from ethical considerations and signal degradation through the cranium, which collectively limit dataset diversity and compromise the fairness of downstream machine learning models. To address these limitations, we introduce UltraVAR (Ultrasound Visual Auto-Regressive model), the first data augmentation framework designed for fUS imaging that leverages a pre-trained visual auto-regressive generative model. UltraVAR is designed not only to mitigate data scarcity but also to enhance model fairness through the reconstruction of diverse and physiologically plausible fUS samples. The generated samples preserve essential neurovascular coupling features--specifically, the dynamic interplay between neural activity and microvascular hemodynamics. This capability distinguishes UltraVAR from conventional augmentation techniques, which often disrupt these vital physiological correlations and consequently fail to improve, or even degrade, downstream task performance. The proposed UltraVAR employs a scale-by-scale reconstruction mechanism that meticulously preserves the spatial topological relationships within vascular networks. The framework's fidelity is further enhanced by two integrated modules: the Smooth Scaling Layer, which ensures the preservation of critical image information during multi-scale feature propagation, and the Perception Enhancement Module, which actively suppresses artifact generation via a dynamic residual compensation mechanism. Comprehensive experimental validation demonstrates that datasets augmented with UltraVAR yield statistically significant improvements in downstream classification accuracy. This work estab-This work was supported by National Natural Science Foundations of China under Grant 62172403, 12326614.


Ultrasound-Guided Robotic Blood Drawing and In Vivo Studies on Submillimetre Vessels of Rats

Jing, Shuaiqi, Yao, Tianliang, Zhang, Ke, Wu, Di, Wang, Qiulin, Chen, Zixi, Chen, Ke, Qi, Peng

arXiv.org Artificial Intelligence

Billions of vascular access procedures are performed annually worldwide, serving as a crucial first step in various clinical diagnostic and therapeutic procedures. For pediatric or elderly individuals, whose vessels are small in size (typically 2 to 3 mm in diameter for adults and less than 1 mm in children), vascular access can be highly challenging. This study presents an image-guided robotic system aimed at enhancing the accuracy of difficult vascular access procedures. The system integrates a 6-DoF robotic arm with a 3-DoF end-effector, ensuring precise navigation and needle insertion. Multi-modal imaging and sensing technologies have been utilized to endow the medical robot with precision and safety, while ultrasound imaging guidance is specifically evaluated in this study. To evaluate in vivo vascular access in submillimeter vessels, we conducted ultrasound-guided robotic blood drawing on the tail veins (with a diameter of 0.7 plus or minus 0.2 mm) of 40 rats. The results demonstrate that the system achieved a first-attempt success rate of 95 percent. The high first-attempt success rate in intravenous vascular access, even with small blood vessels, demonstrates the system's effectiveness in performing these procedures. This capability reduces the risk of failed attempts, minimizes patient discomfort, and enhances clinical efficiency.


Robotic CBCT Meets Robotic Ultrasound

Li, Feng, Bi, Yuan, Huang, Dianye, Jiang, Zhongliang, Navab, Nassir

arXiv.org Artificial Intelligence

The multi-modality imaging system offers optimal fused images for safe and precise interventions in modern clinical practices, such as computed tomography - ultrasound (CT-US) guidance for needle insertion. However, the limited dexterity and mobility of current imaging devices hinder their integration into standardized workflows and the advancement toward fully autonomous intervention systems. In this paper, we present a novel clinical setup where robotic cone beam computed tomography (CBCT) and robotic US are pre-calibrated and dynamically co-registered, enabling new clinical applications. This setup allows registration-free rigid registration, facilitating multi-modal guided procedures in the absence of tissue deformation. First, a one-time pre-calibration is performed between the systems. To ensure a safe insertion path by highlighting critical vasculature on the 3D CBCT, SAM2 segments vessels from B-mode images, using the Doppler signal as an autonomously generated prompt. Based on the registration, the Doppler image or segmented vessel masks are then mapped onto the CBCT, creating an optimally fused image with comprehensive detail. To validate the system, we used a specially designed phantom, featuring lesions covered by ribs and multiple vessels with simulated moving flow. The mapping error between US and CBCT resulted in an average deviation of 1.72+-0.62 mm. A user study demonstrated the effectiveness of CBCT-US fusion for needle insertion guidance, showing significant improvements in time efficiency, accuracy, and success rate. Needle intervention performance improved by approximately 50% compared to the conventional US-guided workflow. We present the first robotic dual-modality imaging system designed to guide clinical applications. The results show significant performance improvements compared to traditional manual interventions.


Deep Sylvester Posterior Inference for Adaptive Compressed Sensing in Ultrasound Imaging

Penninga, Simon W., van Gorp, Hans, van Sloun, Ruud J. G.

arXiv.org Artificial Intelligence

Ultrasound images are commonly formed by sequential acquisition of beam-steered scan-lines. Minimizing the number of required scan-lines can significantly enhance frame rate, field of view, energy efficiency, and data transfer speeds. Existing approaches typically use static subsampling schemes in combination with sparsity-based or, more recently, deep-learning-based recovery. In this work, we introduce an adaptive subsampling method that maximizes intrinsic information gain in-situ, employing a Sylvester Normalizing Flow encoder to infer an approximate Bayesian posterior under partial observation in real-time. Using the Bayesian posterior and a deep generative model for future observations, we determine the subsampling scheme that maximizes the mutual information between the subsampled observations, and the next frame of the video. We evaluate our approach using the EchoNet cardiac ultrasound video dataset and demonstrate that our active sampling method outperforms competitive baselines, including uniform and variable-density random sampling, as well as equidistantly spaced scan-lines, improving mean absolute reconstruction error by 15%. Moreover, posterior inference and the sampling scheme generation are performed in just 0.015 seconds (66Hz), making it fast enough for real-time 2D ultrasound imaging applications.


Ensemble Learning for Microbubble Localization in Super-Resolution Ultrasound

Gharamaleki, Sepideh K., Helfield, Brandon, Rivaz, Hassan

arXiv.org Artificial Intelligence

Super-resolution ultrasound (SR-US) is a powerful imaging technique for capturing microvasculature and blood flow at high spatial resolution. However, accurate microbubble (MB) localization remains a key challenge, as errors in localization can propagate through subsequent stages of the super-resolution process, affecting overall performance. In this paper, we explore the potential of ensemble learning techniques to enhance MB localization by increasing detection sensitivity and reducing false positives. Our study evaluates the effectiveness of ensemble methods on both in vivo and simulated outputs of a Deformable DEtection TRansformer (Deformable DETR) network. As a result of our study, we are able to demonstrate the advantages of these ensemble approaches by showing improved precision and recall in MB detection and offering insights into their application in SR-US.